Terminology Finite-State Preprocessing for Computational LFG

Author

  • Caroline Brun
Abstract

This paper presents a technique for handling multiword nominal terminology in a computational Lexical Functional Grammar. The method treats multiword terms as single tokens by modifying the preprocessing stage of the grammar (tokenization and morphological analysis), which consists of a cascade of two-level finite-state automata (transducers). We describe how we build the transducers to take terminology into account. We tested the method by parsing a small corpus with and without this treatment of multiword terms. The number of parses and the parsing time decrease without affecting the relevance of the results. Moreover, the method improves the perspicuity of the analyses.

1 Introduction

The general issue we are dealing with here is to determine whether there is an advantage to treating multiword expressions as single tokens, by recognizing them before parsing. Possible advantages are the reduction of ambiguity in the parse results, perspicuity in the structure of analyses, and reduction in parsing time. The possible disadvantage is the loss of valid analyses. There is probably no single answer to this issue, as there are many different kinds of multiword expressions. This work follows the integration of (French) fixed multiword expressions like a priori, and time expressions like le 12 janvier 1988, in the preprocessing stage (this integration was done by Frédérique Segond). Terminology is an interesting kind of multiword expression because such expressions are almost but not completely fixed, and there is an intuition that you won't lose many good analyses by treating them as single tokens. Moreover, terminology can be semi- or fully automatically extracted.

Our goal in the present paper is to compare the efficiency and syntactic coverage of a French LFG grammar on a technical text, with and without terminology recognition in the preprocessing stage. The preprocessing consists mainly of two stages: tokenization and morphological analysis.
Both stages are performed by means of finite-state lexical transducers (Karttunen, 1994). In the following, we describe the insertion of terminology into these finite-state transducers, as well as the consequences of such an insertion for the syntactic analysis, in terms of the number of valid analyses produced, parsing time, and the nature of the results.

This work is part of a project that aims at developing LFG grammars (Bresnan and Kaplan, 1982) in parallel for French, English and German (Butt et al., To appear). The grammar is developed in a computational environment called XLE (Xerox Linguistic Environment) (Maxwell and Kaplan, 1996), which provides automatic parsing and generation, as well as an interface to the preprocessing tools we are describing.

2 Terminology Extraction

The first stage of this work was to extract terminology from our corpus. This corpus is a small French technical text of 742 sentences (7000 words). As we have parallel aligned English/French texts at our disposal, we use the English translation to decide when a potential term is actually a term. The terminology we are dealing with is mainly nominal. To perform this extraction task, we use a tagger (Chanod and Tapanainen, 1995) to disambiguate the French text, and then extract the following syntactic patterns, which are good candidates to be terms: N Prep N, N N, N A, A N. These candidates ...
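The two steps described above (extracting candidate terms by part-of-speech pattern, then treating accepted terms as single tokens) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the paper builds finite-state transducers, whereas the sketch uses plain list matching. The tag names (`N`, `A`, `PREP`, `DET`), the function names, the underscore-joined token format, and the sample sentence are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Tag patterns named in the text: N Prep N, N N, N A, A N.
# The tag labels themselves are hypothetical; a real tagger has its own tagset.
PATTERNS = [("N", "PREP", "N"), ("N", "N"), ("N", "A"), ("A", "N")]


def extract_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return every word sequence whose tag sequence matches a pattern."""
    candidates = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


def merge_terms(tokens: List[str], terms: Set[str]) -> List[str]:
    """Greedy longest match: rewrite known multiword terms as single tokens."""
    max_len = max((len(term.split()) for term in terms), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            chunk = " ".join(tokens[i:i + n])
            if chunk in terms:
                # Joining with "_" is an assumed single-token convention.
                out.append(chunk.replace(" ", "_"))
                i += n
                break
        else:
            # No multiword term starts here: keep the word as it is.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical tagged sentence fragment.
tagged = [("le", "DET"), ("levier", "N"), ("de", "PREP"),
          ("vitesses", "N"), ("automatique", "A")]
candidates = extract_candidates(tagged)
tokens = merge_terms([word for word, _ in tagged], {"levier de vitesses"})
```

In this sketch the candidate list still needs a filtering step (the text uses the aligned English translation for that decision); only terms that survive it would be passed to `merge_terms`.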


Related resources

Integrating Finite-state Technology with Deep LFG Grammars

Researchers at PARC were pioneers in developing finite-state methods for applications in computational linguistics, and one of the original motivations was to provide a coherent architecture for the integration of lower-level lexical processing with higher-level syntactic analysis (Kaplan and Kay, 1981; Karttunen et al., 1992; Kaplan and Kay, 1994). Finite-state methods for tokenizing and morph...


Valency Change and Complex Predicates in Wolof: an Lfg Account

This paper presents an LFG-based analysis of Wolof valency-changing suffixes found in applicative and causative constructions. The analysis addresses the particular issue of applicative-causative polysemy in this language. Similar to the work for Indonesian (Arka et al., 2009), I adopt an LFG-based predicate composition approach of complex predicate formation (Alsina, 1996; Butt, 1995), and ext...


Reduction of Computational Complexity in Finite State Automata Explosion of Networked System Diagnosis (RESEARCH NOTE)

This research puts forward rough finite state automata which have been represented by two variants of BDD called ROBDD and ZBDD. The proposed structures have been used in networked system diagnosis and can overcome combinatorial explosion. In the implementation, the CUDD (Colorado University Decision Diagrams) package is used. A mathematical proof of the claimed complexity is provided which shows ZBDD ...


Parsing Modern Greek verb MWEs with LFG/XLE grammars

We report on the first, still ongoing effort to integrate verb MWEs in an LFG grammar of Modern Greek (MG). Text is lemmatized and tagged with the ILSP FBT Tagger and is fed to an MWE filter that marks Words_With_Spaces in MWEs. The output is then formatted to feed an LFG/XLE grammar that has been developed independently. So far we have identified and classified about 2500 MWEs, and have proces...


Transliterating Urdu for a Broad-Coverage Urdu/Hindi LFG Grammar

In this paper, we present a system for transliterating the Arabic-based script of Urdu to a Roman transliteration scheme. The system is integrated into a larger system consisting of a morphology module, implemented via finite state technologies, and a computational LFG grammar of Urdu that was developed with the grammar development platform XLE (Crouch et al. 2008). Our long-term goal is to han...




Publication date: 1998